Filling in NAs with last non-NA value
Problem
You want to replace NAs in a vector or factor with the last non-NA value.
Solution
The code below shows, step by step, how to fill gaps in a vector. If you need to do this repeatedly, use the function in the next section, which can also fill leading NAs with the first non-NA value and handles factors properly.
    # Sample data
    x <- c(NA,NA, "A","A", "B","B","B", NA,NA, "C", NA,NA,NA, "A","A","B", NA,NA)
    # NA NA "A" "A" "B" "B" "B" NA NA "C" NA NA NA "A" "A" "B" NA NA

    goodIdx <- !is.na(x)
    # FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE

    # These are the non-NA values from x only
    # Add a leading NA for later use when we index into this vector
    goodVals <- c(NA, x[goodIdx])
    # NA "A" "A" "B" "B" "B" "C" "A" "A" "B"

    # Fill the indices of the output vector with the indices pulled from
    # these offsets of goodVals. Add 1 to avoid indexing to zero.
    fillIdx <- cumsum(goodIdx)+1
    # 1 1 2 3 4 5 6 6 6 7 7 7 7 8 9 10 10 10

    # The original vector with gaps filled
    goodVals[fillIdx]
    # NA NA "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"
A function for filling gaps
This function does the same as the code above. It can also fill leading NAs with the first non-NA value, and it handles factors by converting them to integer codes and back.
    fillNAgaps <- function(x, firstBack=FALSE) {
        ## NAs in a vector or factor are replaced with last non-NA values
        ## If firstBack is TRUE, it will fill in leading NAs with the first
        ## non-NA value. If FALSE, it will not change leading NAs.

        # If it's a factor, store the level labels and convert to integer.
        # lvls stays NULL for non-factors; a local flag is used rather than
        # exists("lvls"), which could also find a global variable named lvls.
        lvls <- NULL
        if (is.factor(x)) {
            lvls <- levels(x)
            x    <- as.integer(x)
        }

        goodIdx <- !is.na(x)

        # These are the non-NA values from x only
        # Add a leading NA or take the first good value, depending on firstBack
        if (firstBack) {
            goodVals <- c(x[goodIdx][1], x[goodIdx])
        } else {
            goodVals <- c(NA, x[goodIdx])
        }

        # Fill the indices of the output vector with the indices pulled from
        # these offsets of goodVals. Add 1 to avoid indexing to zero.
        fillIdx <- cumsum(goodIdx)+1

        x <- goodVals[fillIdx]

        # If it was originally a factor, convert it back
        if (!is.null(lvls)) {
            x <- factor(x, levels=seq_along(lvls), labels=lvls)
        }

        x
    }

    # Sample data
    x <- c(NA,NA, "A","A", "B","B","B", NA,NA, "C", NA,NA,NA, "A","A","B", NA,NA)
    # NA NA "A" "A" "B" "B" "B" NA NA "C" NA NA NA "A" "A" "B" NA NA

    fillNAgaps(x)
    # NA NA "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"

    # Fill the leading NAs with the first good value
    fillNAgaps(x, firstBack=TRUE)
    # "A" "A" "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"

    # It also works on factors
    y <- factor(x)
    # <NA> <NA> A A B B B <NA> <NA> C <NA> <NA> <NA> A A B <NA> <NA>
    # Levels: A B C

    fillNAgaps(y)
    # <NA> <NA> A A B B B B B C C C C A A B B B
    # Levels: A B C
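A typical place this comes up is a data frame where a group label appears only on the first row of each group. The sketch below is one way that might look. The helper name fillDown and the example data are made up for illustration, and the helper is a stripped-down version of the same cumsum() indexing idea that does not handle factors or leading NAs specially.

```r
# Hypothetical compact helper based on the same cumsum() indexing idea.
# It leaves leading NAs as NA and is meant for plain (non-factor) vectors.
fillDown <- function(v) {
    good <- !is.na(v)
    c(NA, v[good])[cumsum(good) + 1]
}

# Made-up example data: the group label is present only on the
# first row of each group
df <- data.frame(
    group = c("ctrl", NA, NA, "treat", NA),
    value = c(1.2, 1.5, 1.1, 2.3, 2.8),
    stringsAsFactors = FALSE
)

# Fill the label down so that every row carries its group
df$group <- fillDown(df$group)
df$group
```

Because the fill only looks at the order of the rows, the data frame needs to be sorted the way the labels were originally laid out before filling.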
Notes
Adapted from na.locf() in the zoo package.
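If the zoo package is available, calling na.locf() directly may be simpler than maintaining your own function. The following is a minimal sketch, assuming zoo is installed. The shortened sample vector is made up for illustration.

```r
# Assumes the zoo package is installed: install.packages("zoo")
library(zoo)

# Shortened sample data
x <- c(NA, NA, "A", "A", "B", NA, NA, "C", NA)

# na.rm = FALSE keeps the leading NAs instead of dropping them,
# so the result has the same length as the input
filled <- na.locf(x, na.rm = FALSE)

# A second pass with fromLast = TRUE fills the remaining leading NAs
# from the first non-NA value, similar to firstBack = TRUE
filledBoth <- na.locf(filled, na.rm = FALSE, fromLast = TRUE)
```

Note that na.locf() drops leading NAs by default (na.rm = TRUE), which shortens the vector, so na.rm = FALSE is usually what you want when the result has to line up with other columns.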